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Improved estimation of hydrometeorological states from down-sampled observations and 
background model forecasts in a noisy environment, has been a subject of growing research in 
the past decades. Here, we introduce a unified framework that ties together the problems of 
downscaling, data fusion and data assimilation as ill-posed inverse problems. This framework 
seeks solutions beyond the classic least squares estimation paradigms by imposing proper reg- 
ularizations, which are constraints consistent with the degree of smoothness and probabilistic 
structure of the underlying state. We review relevant regularization methods in derivative 
space and extend classic formulations of the aforementioned problems with particular empha- 
sis on hydrologic and atmospheric applications. Informed by the statistical characteristics of 
the state variable of interest, the central results of the paper suggest that proper regulariza- 
tion can lead to a more accurate and stable recovery of the true state and hence more skillful 
forecasts. In particular, using the Tikhonov and Huber regularizations in the derivative space, 
the promise of the proposed framework is demonstrated in static downscaling and fusion of 
synthetic multi-sensor precipitation data, while a data assimilation numerical experiment is 
presented using the heat equation in a variational setting. 



1. Introduction 



In parallel to the growing technologies for earth remote sensing, we have witnessed an increasing interest to 
improve the accuracy of observations and integrate them with physical models for more accurate estimates 
and forecasts of environmental states. Remote sensing observations are typically noisy and coarse-scale 
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representations of a true state variable of interest, lacking sufficient accuracy and resolution for fine- 
scale environmental modeling. In addition, environmental predictions are not perfect as models often 
suffer either from inadequate characterization of the underlying physics or their initial conditions. Given 
these limitations, several classes of estimation problems present themselves as continuous challenges for 
the atmospheric, hydrologic and oceanic science communities. These include: (1) Downscaling which 
refers to the class of problems to enhance the resolution of a measured field or produce a fine-scale 
representation of a coarse-scale modeled field; (2) Data fusion, to produce an improved estimate from a 
suite of noisy observations at different scales; and (3) Data assimilation which deals with estimating initial 
conditions in a predictive model consistent with the available noisy observations and the model dynamics. 
In this paper, we revisit the problems of downscaling (DS), data fusion (DF), and data assimilation (DA) 
focusing on a common thread between them as inverse estimation problems. Proper regularizations and 
solution methods are proposed to efficiently handle large-scale data sets while preserving key statistical 
and geometrical properties of the underlying fields of interest. Here, we only examine hydrometeorological 
problems with particular emphasis on land-surface applications. 

In land surface hydrologic studies, downscaling of precipitation and soil moisture signals has received 
considerable attention, using a relatively wide range of methodologies. DS methods in hydrometeorology 
and climate studies generally fall into three main categories namely, dynamic downscaling, statistical 
downscaling, and variational downscaling. Dynamic downscaling often uses a regional physical model 
to reproduce fine-scale details of the state of interest consistent with the large-scale scale observations 



or outputs of a global circulation model [e.g., Reichle et al, 2001a; Castro et a/., 2005; Zupanski et al. 



2010| . Statistical downscaling methods encompass a large group of methods that typically use empirical 
multiscale statistical relationships, parameterized by observations or other environmental predictors, to 
reproduce realizations of fine-scale fields. Precipitation and soil moisture statistical downscaling has been 
mainly approached via spectral and (multi) fractal interpolation methods, capitalizing on the presence 
of a power law spectrum and a statistical self-similarity/self- affinity in precipitation and soil moisture 



fields [among others, Lovejoy and Mandelbrot, 1985; Lovejoy and Schertzer, 1990; Gupta and Waymire 



1993{ Kumar and Foufoula- G eorgiou , [T993a|b ; Perica and Foufoula-Georgiou\\1996]Veneziano et a/.[|1996[ 
Wilby et al\ |1998b|a[ \DeUda\ [20001 \Kim and Bafws\ [20021 \Rebora et al\ [20051 \Badas et al\^me[\Merlin 



et al, 2006 1 . In variational approaches, a direct cost function is defined whose optimal point is the desired 



fine-scale field which can be obtained via using an optimization routine. Recently along this direction, 
Ebtehaj et al |2012| cast the rainfall DS problem as an inverse problem using sparse regularization to 
address the intrinsic rainfall singularities and non-Gaussian statistics. This variational approach belongs 
to the class of methodologies presented and extended in this paper. 

The DF problem has also been a subject of continuous interest in the precipitation science community 
mainly due to the availability of rainfall measurements from multiple spaceborne (e.g., TRMM and GOES 
satellites) and ground-based sensors (e.g., the NEXRAD network). The accuracy and space-time coverage 
of remotely sensed rainfall are typically conjugate variables. In other words, more accurate observations 
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are often available with lower space-time coverage and vice verse. For instance, low-orbit microwave 
sensors provide more accurate with less space-time coverage compared to the high-orbit geo-stationary 
infrared (GOES-IR) sensors. Moreover, there are often, multiple instruments on a single satellite platform 
(e.g., precipitation radar and microwave imager on TRMM), each of which measures rainfall with differ- 
ent footprints and resolutions. A wide range of methodologies including weighted averaging, regression, 
filtering, and neural networks has been applied to combine microwave and Geo-IR rainfall signals [e.g., 
Adler et al\ [20031 \Huffman et al\ [T9951 \Sorooshian et al.\ [20001 \Huffrnan et al\ |200T| \Hong eTcd] [2004 



Huffman et al } 20071. Furthermore, a few studies have addressed methodologies to optimally combine 



the products of the TRMM precipitation radar (PR) with the TRMM microwave imager (TMI) using 
Bayesian inversion and weighted least squares approaches [e.g., Masunaga and Kummerow\ |2005 ; \Kum 



merow et a/., 20101. From another direction, Gaussian filtering methods on Markovian tree-like structures, 



the so-called scale-recursive-estimation (SRE), have been proposed to merge spaceborne and ground-based 
rainfall observations at multiple scales [e.g. , \G Orenburg et a/. , 2001; Tustison et a/., 2003; Bocchiola, 2007 



Van de Vyver and Roulin, 2009; Wang et a/., 2011] , see also \Kumar , 1999] for soil moisture application. 



Recently, using the Gaussian scale mixture probability model and an adaptive filtering approach \Ebtehaj 



and Foufoula-Georgiou |2011b| proposed a fusion methodology in the wavelet domain to merge TRMM- 



PR and ground-based NEXRAD measurements, aiming to preserve the non-Gaussian structure and local 
extremes of precipitation fields. 

Data assimilation has played an important role in improving the skill of environmental forecasts and has 
become by now a necessary step in operational prediction models [see Daley, 1993 1. Data assimilation 
amounts to integrating the underlying knowledge from the observations into the first guess or the back- 
ground state, typically provided by a physical model. The goal is then to obtain the best estimate of the 
current state of the system, the so called analysis. The analysis with reduced error is then used to forecast 
the state at the next time step and so on [see Daley, 1993; Kalnay , 2003, for a comprehensive review]. 
Note that, in practice, the present background state, used in each analysis cycle, is commonly a forecast 
of the state by the underlying model, initialized by the analysis state in the previous time step. One 
of the most common approaches to the data assimilation problem relies on variational techniques [e.g., 
Sasaki , 1958 , 1970[ \Lorenc , 1986[ 'Talagra nd and Courtier\ 1987[ Courtier and Talagrand\ |1990[ \Parrish 



and Derber, 1992; Zupanski, 1993; Courtier et al, 1994; Reichle et a/., 2001b, among many others]. In 



these methods, one explicitly defines a cost function, typically quadratic, whose potentially unique min- 
imizer is the analysis state. Very recently Freitag et al. |2012| proposed a regularized data assimilation 
scheme to improve assimilation results in advective sharp fluid fronts. 

What is common in the DS, DF, and DA problems is that, in all of them, we seek an improved estimate 
of the true state given a suite of noisy and down-sampled observations and/or uncertain model-predicted 
states. Specifically, let us suppose that the unknown true state in continuous space is denoted by x(t) and 
its indirect observation (or model-output), by y(r). Let us also assume that x(t) and y(r) are related via 
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a linear integral equation, called the Fredholm integral equation of the the first kind, as: 

[ 1 H(r,t)x(t)dt = y(r), < r < 1, (1) 
Jo 

where ?^(r, t) is the kernel relating x(t) and y{r). Recovery of x(t) knowing y(r) and 7^(r, t) is a classic 
linear inverse problem. Clearly, the deconvolution problem is a very special case with the kernel of the 
form %{r — £), which in its discrete form, plays a central role in this paper. Linear inverse problems are 
by nature ill-posed, in the sense that they do not satisfy at least one of the following three conditions: 
(1) Existence; (2) Uniqueness; and (3) Stability of the solution. For instance, when due to the kernel 
architecture, the dimension of the observation is smaller than that of the true signal, infinite choices of 
x{t) lead to the same y(r) and there is no unique solution for the problem. For the case when y{r) is 
noisy and has a larger dimension than the true state, the solution is typically very unstable, because, the 
high frequency components in y(r) are typically amplified and spoil the solution. In fact, the higher the 
frequency, the larger the amplification in the solution, which is often called the inverted noise; see, e.g., 
Hansen |2010| for a comprehensive account on linear inverse problems. A common approach to make an 



inverse problem well-posed is via the so-called regularization methods. The goal of regularization is to 
reformulate the inverse problem aiming to obtain a unique and sufficiently stable solution. The choice of 
regularization typically depends on the continuity and degree of smoothness of the state variable of interest, 
often called the regularity condition. For instance, some state variables are very regular with high degree 
of smoothness (e.g., pressure) while others might be more irregular and suffer from frequent and different 
sorts of discontinuities (e.g., rainfall). In fact, the choice of regularization not only yields unique and stable 
solutions but also reinforces the underlying regularity of the true state in the solution. It is important to 
note that, different regularity conditions are theoretically consistent with different statistical signatures in 
the true state, a fact that may guide proper design of the regularization, as explored in this study. 

The central goal of this paper is to propose a unified framework for the class of DS, DF, and DA problems 
by recasting them as discrete linear inverse problems using relevant regularizations in the derivative space 
aiming to solve them more accurately compared to the classic formulations. Examples on rainfall DS 
and DF are presented to quantitatively elaborate on the effectiveness of the presented methodologies. 
In these examples, regularization is performed consistent with the non-Gaussian statistics of rainfall in 
the derivative space, which might be generalized to other land-surface signals such as soil moisture. For 
the DA family of problems, the promise of the presented framework, is demonstrated via an elementary 
example using the heat equation. It turns out that the accuracy of the analysis and forecast cycles can 
be improved compared to the classic DA methods, especially when the initial state is discontinuous. This 
simple example provides insight into how the new formulations can outperform the classic methods and 
be of special interest in hydrometeorological applications. 

This paper is structured as follows. Section 2 provides conceptual insight into the discrete inverse problems. 
Section 3 describes the DS problem in detail, as a primitive building block for the other studied problems. 
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Figure 1: Schematics of under- determined (a) and over- determined (b) linear discrete observation model in 
(J2|. (a) The observation matrix is fat with more columns than rows and x has a greater dimension than y. 
(b) The observation matrix is skinny with more rows than columns and size of y exceeds the dimension of 



x. 



Important classes of regularization methods are explained and their statistical interpretation is discussed 
from a Bayesian point of view. Examples on rainfall downscaling are presented in this section, taking 
into account the specific regularity and statistical distribution of the rainfall fields in the derivative space. 
Section 3 is devoted to the regularized DF class of problems with examples and results again on rainfall 
data. The regularized DA problem is discussed in Section 4 with an illustrative example using the heat 
equation. Concluding remarks and future research perspectives are discussed in Section 5. 



2. Discrete Inverse problems: Conceptual Framework 

In this section, we briefly explain the conceptual key elements of discrete linear inverse estimation relevant 
to the problems at hand and leave further details for the next sections. Analogous to equation ([!]), linear 
discrete inverse problems typically amount to estimating the true m-element state vector x E R m from 
the following linear observation model: 

y = Hx + v, (2) 

where y E R n denotes the measurement, e.g., output of a sensor, H E R nxm is an n-by-m observation 
matrix and v ~ A/"(0, R) is the Gaussian error in R n . Note that, the observation operator, which is a 
discrete representation of the kernel in 0, and the noise covariance are supposed to be known or properly 
calibrated. Depending on the relative dimension of y and x, this linear system can be under-determined 
(m ^> n) or over-determined (m <C n) (see, Figure [I]). Provided that H is full rank, in the first case, there 
are infinite different x's that satisfy ^ while for the second case a single exact solution does not exist. 
As is evident, the DS class of problems belongs to the under-determined class because the sensor output 
is a coarse-scale and noisy representation of the true state. However, the class of DF and DA problems 
fall into the category of over-determined problems, as the total size of the observations and background 
state exceeds the dimension of the true state (Figure [I]). 

In each of the above cases, we may naturally try to obtain a solution with minimum error variance by 
solving a linear least squares (LS) problem. However, for the under-determined case the solution still does 
not exist, while for the over-determined case it is commonly ill-conditioned and sensitive to the observation 
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noise (see, Section 4). Therefore, the minimum variance LS treatment can not properly make the above 
inverse problems well-posed. To obtain a unique and stable solution, beyond the LS minimization of the 
error, the basic idea of regularization is to further constrain the solution. For instance, among many 
solutions that fit the observation model in ([2]), we can obtain the one with minimum energy, mean-squared 
curvature or total variation. The choice of this constrain or regularization, highly depends on a priori 
knowledge about the underlying regularity of x. For sufficiently smooth x we naturally may promote a 
solution with minimum mean-squared curvature to impose smoothness on the solution. However, if a state 
is non-smooth and contains frequent jump discontinuities, a solution with minimum total variation might 
be a better choice. In subsequent sections, we explain these concepts in more detail for the DS, DF, and 
DA problems with hydrometeorological examples. 



3. Regularized Downscaling 
3.1. Problem Formulation 

To put the DS problem in a linear inverse estimation framework, we recognize that in the observation 
model of equaiton Q, the true state x E M m has a larger dimension than the observation vector y EK n , 
m ^> n. Note that throughout this work, a notation is adopted in which the vector x E R m may also 
represent, for example a 2D field X E Rv / ^ x v / ^ which is vectorized in a fixed order (e.g., lexicographical). 

As explained in the previous section, the DS problem naturally amounts to obtaining the best weighted 
least squares (WLS) estimate x of the high-resolution or fine-scale true state as 

x = argmin j ^ ||y - Hx|| R _i j , (3) 
where, ||x||^ = x T Ax denotes the quadratic-norm, while A is a positive definite matrix. In the above 

1 1 1 1 1 2 ^ 

notation, ^ \\y — Hx|| R _i is called the cost function whose minimum is attained at x. Due to the ill-posed 
nature of the problem, this optimization does not have a unique solution. Specifically, taking the derivative 
and setting it to zero we get 

(H T R _1 H) x = H T R V, (4) 

in which H T R _1 H is definitely singular. Indeed, many choices of x lead to the same right-hand side. 
To narrow down all possible solutions to a stable and unique one, a common choice is to regularize the 
problem by constraining the squared Euclidean norm of the solution to be less than a certain constant, 
that is ||Lx||2 < const., where L is an appropriately chosen transformation and ||x||2 = J2i x i denotes the 
Euclidean /2-norm. Note that, by putting a constraint on the Euclidean norm of the state, we not only 
narrow down the solutions but also implicitly suppress the large components of the inverted noise and 
reduce their spoiling effect on the solution. 
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Using the theory of Lagrangian multipliers the dual form of the constrained version of the optimization in 
© is 

x = argmin|i||y-Hx||^_i + A HLx^j , (5) 
where A > is the Lagrangian multiplier or the so-called regularizer. 

The first term in ^ measures how well the solution approximates the given (noisy) data, while the second 
term imposes a specific regularity on the solution. In effect, the regularizer plays a trade off role between 
making the fidelity to the observations sufficiently large, while not imposing too much regularity (in this 
case, smoothness) on the solution. The smaller the value of A, the more weight is given to fitting the 
(noisy) observations resulting in solutions that are less regular and prone to overfitting. On the other 
hand, the larger the value of A the more weight is given to the regularity of the solution. The goal is to 
find a regularized solution, by finding a good balance between the two terms such that the it is sufficiently 
close to the observations while obeying the underlying property of the true state. As is evident, the L- 
transformation of the state also plays a key role in the solution of equation For instance, choosing 
an identity matrix implies that we are looking for a solution with a small Euclidean norm (energy), the 
so called least-norm solution, while if L represents an approximation of the second order derivative (i.e., 
Laplacian transform), the 1 1 1 1 ^ is a measure of the mean-square curvature of the state which imposes 
extra smoothness on the solution. 

The problem in equation ^ is a smooth convex quadratic programming problem since the cost function 
is differentiable and its Hessian H T R _1 H + 2AL T L is always positive definite for any A > 0, provided 
that L T L is positive definite. This problem is known as the Tikhonov regularization with the following 
analytical solution 

x = (H T R _1 H + 2 A L T L) _1 H T R _1 y, (6) 



| Tikhonov et a/. , 19771. It is easy to show that the covariance of the estimate is the inverse of the Hessian 



of the cost function in (|5) which is (H T R _1 H + 2A L T L) 1 . Notice that, for large scale regularization 
problems, this closed form solution can not be directly computed and efficient iterative methods (e.g., 
Newton's method) need to be employed. 

Depending on the selected transformation and the intrinsic regularity (degree of smoothness) of the un- 
derlying state, other choices of the regularization term are also common. For example, in case the L- 
transformation projects the major body of the state vector onto (near) zero values, a common choice is 
the /i-norm instead of the /2-norm in ^ [e.g., Chen et a/., 1998| . Such a property is often referred to 
as sparse representation in the L-transformation space and gives rise to the following formulation of the 
regularized DS problem: 

x = argmin |^ ||y - Hx||^_i + A HLx^j , (7) 

where, the /i-norm is Hx^ = ^ \x{\. Note that, the problem in Q is a non-smooth convex optimization 
as the regularization term is non-differentiable and the conventional iterative gradient descent methods 
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are no longer applicable in their standard forms. Several optimization techniques are currently under 
development to directly solve the non-smooth inverse problem in Q that may include: (1) The iterative 



shrinkage thresholding algorithms pioneered by Tibshirani [1996 1; (2) The basis pursuit by \ Chen et al. 
1 1998 1; (3) Constrained quadratic programing [e.g., Figueiredo et al } 2007 1; (4) Proximal gradient based 
methods [e.g., Beck and Teboulle, 2009 1; and (5) Interior point methods [e.g., Kim et al., 2007] . 

One of the common approaches to treat the non-differentiability in Q is to replace the Zi-norm with a 
smooth approximation, the so called Huber norm, ||x|| Hub = J2i PT(%i), where 



p T {x) 



x 



\x\ < T 



T(2\x\ -T) \x\ > T 



(8) 



and T denotes a non-negative threshold (Figure [2]). The Huber norm is a hybrid one that behaves similar 
to the Zi-norm for values greater than the threshold T while for smaller values it is identical to the I2- 
norm. From the statistical regression point of view, the sensitivity of a norm as a penalty function to the 
outliers depends on the (relative) values of the norm for large residuals. If we restrict ourselves to convex 
norms, the least sensitive ones to the large residuals or say the outliers are those with linear behavior for 
large input arguments (i.e., l\ and Huber). Because of this property, these Zi-like norms are often called 
robust norms, \Huber\ |1964[ |1981[ \Boyd and Vandenberghe\ |2004| . Throughout this paper, for solving 
/i-regularized inverse problems, we use the Huber relaxation due to its simplicity, efficiency and adaptivity 
to all of the concerning classes of DS, DF, and DA problems. 



2.5 




X 



Figure 2: The Zi, I2 and Huber norms. The Huber penalty is a smooth relaxation of the Zi-norm which 
acts quadratically for input values smaller than the threshold while it behaves linearly for larger inputs. For 
inputs with heavy tail behavior, linear penalization in the regularization term is advantageous compared to 
the quadratic penalty functions in which the cost function becomes dominated by a few large values in the 
tail of the distribution. 



In downscaling of state variables containing frequent jumps and isolated singularities (e.g., local rainfall 
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extremes) using regularization in the derivative space (i.e., Lx is an approximate derivative measure), the 
Zi-like norms typically preserve rapid variations and lead to improved and sharper solutions, compared 
to the overly smooth results of the /2-regularization; see, examples presented by Ebtehaj et al |2012| for 
rainfall downscaling and other examples in the next section. 

3.2. Statistical Interpretation 

From the frequentist statistical point of view, it is easy to show that the WLS solution of ^ is equivalent 
to the maximum likelihood estimator (ML) 

xml = argmaxp (y |x) , (9) 

X 

given that the likelihood density is Gaussian, p(y|x) oc exp (-i/2(y — Hx) T R _1 (y — Hx)). Specifically, 
taking — log(-), one can find the minimizer of the negative log-likelihood function — log{p(y|x)} as, 



xml = argmin j^(y-Hx) T R 1 (y-Hx)| 



= argmin ji ||y - Hx||^_i j , (10) 
which is identical to the WLS solution of ([3]). 

It is important to note that in the ML estimator, x is considered to be a deterministic variable (fixed) 
while y has a random nature. 

On the other hand, in the Bayesian perspective, the regularized solution of equation ^ is equivalent to 
the maximum a posteriori (MAP) estimator 

xmap = argmaxp (x|y) , (11) 

X 

where both x and y are considered of random nature. Specifically, using Bayes theorem, ignoring the 
constant terms in x and applying — log(-) on the posterior density p(x|y), we get 

xmap argmin < - log 



p(y) 

= argmin {— log p (y|x) — log p(x)} . (12) 

X 

The first term, — log p(y|x), is just the negative log-likelihood as appeared in the ML estimator and the 
second term is called the prior which accounts for the a priori knowledge about the density of the state 
vector x. Accordingly, the proposed Tikhonov regularization in ^ is equivalent to the MAP estimator 
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assuming that the state, or the linear transformed state Lx, is a multivariate Gaussian of the form 



log p(x) oc x T Qx, 



(13) 



where the covariance is Q = L T L [e.g., 



Tikhonov et a/., 


1977; 


Elad and Feuer, 


1997; 


Levy , 



2008 



Clearly, the choice of the Zi-norm in equation Q, implies that log p(x) oc HLx^ or say the transformed 
state can be well explained by a multivariate Laplace density with heavier tail than the Gaussian case [e.g., 



Tibshirani , |1996 ; \Lewicki and Sejnowski , 2000 1. Similarly, selecting the Huber norm can also be interpreted 



as log p(x) oc J2i Pr(%i), which is equivalent to assuming the Gibbs density function as the prior probability 
model of the state \Geman and Geman } 1984; Schultz and Stevenson, 1994| . The equivalence between the 
regularized estimation approach, which imposes constraints on the regularity of the solution, and its 
Bayesian counterpart, which imposes constraints on the prior probability of the state is very insightful. 
This relationship establishes an important duality which can guide the selection of regularization depending 
on the statistical properties of the state in the real or derivative space. 



3.3. Examples on Rainfall DS 

3.3.1. Problem Formulation and Settings 

To downscale a remotely sensed hydrometeorological signal using the explained discrete regularization 
methods, we need to have proper mathematical models for the downgrading operator and also a priori 
knowledge about the form of the regularization term. Clearly, the downgrading operator in the presented 
framework needs to be a linear approximation of the sampling property of the sensor. If a sensor directly 
measures the state of interest, while its maximum frequency channel is smaller than the maximum fre- 
quency content of the state (e.g., precipitation), the result of the sensing would be a smoothed and possibly 
downsampled version of the true state. Thus, each element of the observed state in a grid-scale might 
be considered as an average (low-resolution) representation of the true state, lacking the high-resolution 
subgrid-scale variability. To have a simple and tractable mathematical model, the downgrading matrix 
might be considered translation invariant and decomposed into H = DC, where C encodes the smoothing 
effect and D contains information about the sampling rate of the sensor. As a simple mathematical model, 
let us suppose that each grid point in the low-resolution observation is a (weighted) average of a finite 
size neighborhood of the true state, positioned at the center of the grid. In this case, the sensor smooth- 
ing property in C can be encoded by the filtering and convolution operations, while D acts as a linear 
downsampling operator (Figure [3]). These matrices can be formed explicitly, while direct matrix- vector 
multiplication (e.g., Cx and C T x, x E R m ), requires a computational cost in the order of 0(m 2 ). However, 
for large scale problems, those matrix- vector multiplications (i.e., convolution operation) are typically per- 
formed more efficiently, for instance in the Fourier domain with computational cost of 0{m\ogm). Figure 
[3] sketches the above matrix- vector multiplication via the filtering and convolution operations. 
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d) Upsampling and Downsampling 
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Figure 3: Two dimensional mathematical models of the smoothing and downsampling property of a low- 
resolution sensor via the convolution operation, (a) A simple representation of the used observation model 
for a neighborhood of size 3-by-3 using a simple uniform averaging filter, (b-c) A sample effect of the filtering 
operation (C) and its transpose (C T ) on a discrete 2D unit pulse, given the shown 3x3 kernel, (d) A sample 
effect of the 2D downsampling operator (D) and its transpose (D T ) with scaling ratio 2. In the filtering 
operation, we basically slide the kernel over the field and sum the element-wise multiplication of the kernel 
elements with those of the field and then put the results at the center of the current position of the kernel. 
However, in the convolution operation we first rotate the kernel by 180° and follow the same procedure. 



As is evident, the smoothing kernel needs to be estimated for each sensor, possibly by learning from a 
library of coincidental high and low-resolution observations (e.g., Ebtehaj et a/., 2012| ), or through a direct 
minimization of an associated cost. In the absence of prior knowledge, one possible choice is to assume 
that the sensor observes a coarse grained (i.e., non-overlapping box averaging) and noisy version of the 
true state. In other words, to produce a field at the grid-scale of s c x s c from a 1 x 1, this assumption is 
equivalent to selecting a uniform smoothing kernel of size s c x s c followed by a downsampling ratio of s c 
(Figure [ife 
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Figure 4: (a) A uniform smoothing (low-pass) kernel of size s c x s c . (b) The discrete (high-pass) generalized 
Laplacian filter of size 3x3, where k is a parameter ranging between to 1. The Laplacian coefficients, 
obtained by filtering the 2D state with the Laplacian kernel, are approximate measures of the second order 
derivative. Throughout this paper, we choose k = 0.5 which corresponds to the 2D standard second order 
differencing operation. 



The choice of the regularization term also plays a very important role on the accuracy of the DS solution. 
Figure^ demonstrates a NEXRAD reflectivity snapshot (lxl km) over the Texas TRMM ground valida- 
tion (GV) site, while Figure^ displays the standardized histogram of the discrete Laplacian coefficients 
(second order differences) and the fitted exponential of the form p(x) oc exp(— A \x\). It is seen that the 
analyzed rainfall image exhibits a (near) sparse representation in the derivative space with a large mass 
at zero and heavier tail than the Gaussian. Although not shown here, the universality of this structure 
can be observed in other rainfall reflectivity fields, denoting that the choice of the /i-like regularization is 
preferred in the rainfall DS problems rather than the choice of the Tikhonov regularization; see, \Ebtehaj 



and Foufoula-Georgiou |2011a| for a thorough survey of rainfall statistics in derivative space, in terms of 



the wavelet coefficients. 

This well behaved non-Gaussian structure in the derivative space mainly arises due to the presence of 
spatial coherence (correlation) in the rainfall fields and abrupt occurrences of piece-wise discontinuities 
(large gradients). In effect, over the large areas of uniform rainfall intensity, a measure of derivative 
translates rainfall values into a large number of (near) zero values; however, over the less frequent jumps 
and isolated high-intensity rain-cells, values of the derivative measure are markedly larger than zero and 
form the tails. Note that this non-Gaussianity is due to the intrinsic spatial structure of rainfall reflectivity 
fields and can not be resolved by a logarithmic or power-law transformation (e.g., Z-R relationship). It 
is easy to see that after applying the Z-R relationship on the reflectivity fields, the shape of the rainfall 
histogram remains non-Gaussian and still can be explained by the Laplace density (not shown here). In 
practice, the histogram of the derivatives may exhibit a thicker tail than the Laplace density, requiring 
a heavier tail model such as the Generalized Gaussian Density (GGD) of the form p(x) oc exp (—A \x\ p ), 



where p < 1 [see, Ebtehaj and Foufoula-Georgiou , 2011a|. However, using such a prior model gives rise to 



a non-convex optimization problem in which convergence to the global minimum can not be guaranteed. 
Hence, the choice of the Zi-norm (the Laplace prior) is indeed the closest convex regularization that can 
partially fulfill the strict statistical interpretation, while the actual prior might still be better explained 
by a heavier tail model than the Laplace density. 

Following our observations related to the distribution of the rainfall derivatives, here we direct our attention 
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Figure 5: A rainfall reflectivity field and the distribution of its standardized Laplacian coefficients with unit 
standard deviation, Lx n = Lx /std(Lx), where std(-) is the standard deviation operator and V 2 x = Lx. (a) 
NEXRAD reflectivity snapshot at the TRMM GV-site in Houston, TX (HSTN) on 1998/11/13 (00:02:00 
UTC) at scale lxl km. (b) The histogram of the standardized Laplacian coefficients , with n = 0.5 (Figure 
[4|, and (c) the log-histogram. Note that, the zero coefficients over the non-rainy background have been 
excluded from the histogram analysis. The dash line in (b) is the least squares fitted exponential of the 
form p(x) ex exp(— A \x\) and the dash-dot line shows a standard normal distribution for comparison. The 
log-histogram in (c) contrasts the heavy tailed structure of the coefficients versus the Gaussian distribution 
clearer than the original histogram in (b). 



to the Huber penalty function as a smooth approximation of the /i-regularization, 



Hxl 



R 



-i + A IlLxl 



Hub ' 



(14) 



Minimization of the above cost function can be easily achieved using first order efficient gradient descent 
based methods. However, as the rainfall fields are non-negative, we used the gradient projection (GP) 
method \Bertsekas , 1999, pp. 228], to solve the above problem in a constrained mode such that x y (see 
Appendix A). 



3.3.2. Results 

The same rainfall snapshot shown in Figure [5] has been used to examine the performance of the proposed 
regularized DS methodologies. Throughout the paper, to make the reported parameters independent of 
the range of intensity values, the rainfall reflectivity fields are first scaled into the range between and 1; 
however, the results and Figures are presented in the true range. 

To demonstrate the performance of the proposed regularized DS methodology, the NEXRAD high-resolution 
observation x was assumed as the true state while the low-resolution observations y were obtained by 
smoothing x with an average filter of size s c x s c followed by a downsampling ratio s c . Given the true 
state and constructed observations, we can quantitatively examine the effectiveness of the presented DS 
methods. In fact, selecting some common quality metrics, we demonstrate the effective improvement of 
those measures using the presented regularized DS framework. 
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Both the Huber and Tikhonov regularizations were examined to downscale the observations from scales 
4x4 and 8x8 km down to 1 x 1 km (Figure [6]). A very small amount of white noise (i.e., standard 
deviation of le-3) was added to the low-resolution observations (equation [2]) , giving rise to a diagonal 
error covariance matrix. In both of the regularization methods, the regularization parameter A was set 
to 5e-3 and le-2 for downscaling from 4-to-l and 8-to-l km in grid spacing, respectively. These values 
are selected through trial and error; however, there are some formal methods for automatic estimation of 
this parameter which are left for future work [e.g., the L-curve; see Hansen, 2010| . In our experiments, 
it turned out that small values of the Huber threshold T, typically less than 10% of the field maximum 
range of variability, lead to a successful recovery of isolated singularities and local extreme rainfall cells 
(Figure [7]). 

In the studied snapshot, coarse graining of the rainfall reflectivity fields to the scales of 4 x 4 and 8x8 
kilometers was equivalent to loosing almost 20 and 30 percent of the rainfall energy in terms of the relative 
Root Mean Squared Error (RMSE), RMSE r = l|x-x|| 2 /|| x || 2 (see, Table[l]). Note that, to compute the RMSE 
of the low-resolution observations, the size of those fields was extended to the size of the true field using the 
nearest neighborhood interpolation, that is, each low-resolution pixel was replaced with s c x s c pixels with 
the same intensity value. In addition to the relative RMSE measure, we also used three other metrics: (1) 
Relative Mean Absolute Error (MAE), MAE r = 11^— ^11 i/||x|| x ; (2) A logarithmic measure often called the 
peak signal-to-noise ratio (PSNR), PSNR = 201og 10 ( max (*)/std(x-x)) where std(-) denotes the standard 
deviation and; (3) The Structural Similarity Index (SSIM) by \Wang etld] |2004| . The PSNR in decibel 
(dB), represents a measure that not only contains RMSE information but also encodes the recovered range. 
The latter measure varies between -1 and 1 and the upper bound refers to the case where the estimated and 
reference fields are perfectly matched. The SSIM metric is popular in the image processing community as 
it takes into account not only the marginal statistics such as the RMSE but also the correlation structure 
between the estimated and reference (true) image. This metric seems very promising for analyzing the 
forecast mismatch with observations in hydro-meteorological studies, especially when the systematic errors 
in the large scale features of the predicted state (e.g., displacement error) might be more dominant than 
the random errors; see Ebtehaj et al. [2012] for applications of SSIM in rainfall downscaling. 

On average, it was seen that one third of the lost relative energy of the rainfall reflectivity fields can be 
restored via the regularized DS (Table [I]). The /2-regularization led to smoother results and as the scaling 
ratio grows, this regularization was almost incapable to recover the peaks and the correct variability range 
of the rainfall field (Figure [7]). Typically, as expected, the Huber regularization results were slightly better 
than the Tikhonov ones. For large scaling ratios (i.e., > 4 x 4 km) the results of those methods tended 
to coincide in terms of the global quality metrics such the RMSE. However, using the Huber prior, the 
recovered range was markedly better than that by the Tikhonov regularization as reflected in the PSNR 
metric. 
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Figure 6: Sample results of the rainfall regularized Downscaling (DS). (a) True high-resolution rainfall 
reflectivity: NEXRAD snapshot at the TRMM GV-site in Houston, TX (HSTN) on 1998/11/13 (00:02:00 
UTC) at resolution lxl km. (b-c) The synthetically generated, 4x4 and 8x8 km, coarse-scale and noisy 
observations of the true rainfall reflectivity field. Left column: Tikhonov (d) and Huber (f) regularization 
results for downscaling from 4-to-l km (T = 0.02). Right column: Tikhonov (e) and Huber (g) regularized 
DS for downscaling from 8-to-l km (T = 0.04). Zooming views of the delineated box in (g) are shown in 
Figure [7| 



Metric 1 ' 


Observations 


Tikhonov 


Huber 




4 x 4 km 


8 x 8 km 


4 x 4 km 


8 x 8 km 


4 x 4 km 


8 x 8 km 


RMSE r 


0.19 


0.29 


0.15 


0.20 


0.14 


0.19 


MAE r 


0.15 


0.25 


0.13 


0.18 


0.11 


0.17 


SSIM 


0.71 


0.56 


0.78 


0.66 


0.80 


0.66 


PSNR 


23.8 


19.6 


26.5 


23.1 


27.0 


24.0 



Table 1: Results showing the effectiveness of the proposed regularized DS by reducing the estimation error 
and increasing the accuracy of the rainfall fields. The first two columns refer to the values of the quality 
metrics obtained by comparing the constructed low-resolution observations with true lxl km reflectivity 
field. The performance of the Huber prior is slightly better than the /2-regularization, especially for the small 
scaling ratios (i.e., < 4 x 4 km).''" RMSE r : relative root mean squared error; MAE r : relative maximum 
absolute error; SSIM: structural similarity; and PSNR: peak signal to noise ratio. 



4. Regularized Data Fusion 
4.1. Problem Formulation 

Analogous to the DS problem in the previous section, here we focus on the formulation of the DF problem. 
In the DF class of problems, typically, an improved estimate of the true state is sought from a series of low- 
resolution and noisy observations. Let x E K m be the true state of interest while a set of N downgraded 
measurements y l E R n S i = 1, . . . , N, are available through the following linear observation model 

y* = HSc + v, (15) 

where rii « m, ff G j^xra an( j v ^ ^ j\f (q, denotes uncorrelated Gaussian error in R n % E w l (v J ) T = 
0. Compared to the DS family of problems, DF is more constrained in the sense that usually there are 
more equations than the number of unknowns, T^rii ^> m, giving rise to an overdetermined linear system. 
As previously explained, naturally the linear weighted least squares (WLS) estimate of the true state, 
given the series of N observations, amounts to solving the following optimization problem: 

'1 N , N 

1 x — > /n n2 



x = argmin j- ^ (J|y* - ITxH^-x j | . (16) 

Notice that the solution of the above problem not only contains information about all of the available 
observations (Fusion) but also, with proper design of the observation operators, allows us to obtain an 
estimate with higher resolution than any of the available observations (Downscaling). Clearly, the inverse 



of each covariance matrix in (16) plays the role of the relative contribution or weight of each y l in the 
overall cost. In other words, if the elements of covariance matrix of a particular observation vector are 
large compared to those of the other observation vectors, naturally, the contribution of that observation 
to the obtained solution would be less significant. 
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Figure 7 : A zooming view for comparing qualitatively the Tikhonov (a, c) versus Huber (b, d) regularization, 
for the Downscaling (DS) example over the delineated box in Figure |6^. The results indicate a better 
performance of the Huber regularization, especially for smaller scaling ratios. The Huber regularization 
yields sharper results and is more capable to recover high-intensity rainfall cells and the correct range of 
variability; see Table [T] for quantitative comparison using a suit of metrics. 



For notational convenience, the above system of equations can be augmented as follows: 



V" 




H 1 




V" 








X + 















Hx + v 



(17) 
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where, the concatenated error vector v has the following block diagonal covariance matrix, 

"R 1 



R = E [; 



vv 







R 



N 



(18) 



Therefore, the DF problem can be recast as the classic problem of estimating the true state from the 
augmented observation model of y = Hx + v. Setting the gradient of the cost function in equation (16) 
to zero, yields the following linear system: 



(H T R _1 H) x = H T R V. 



(19) 



In this case, the left hand side H T R 1 H is likely to be very ill-conditioned giving rise to an unstable LS 
solution, highly sensitive to any perturbation in the observation vector in the right-hand side (Figure [8b) 



[see, e.g., Elad and Feuer } 1997; Hansen, 2010| . 

One possible remedy for stabilizing the solution is regularization. Recalling the formulation discussed in 
the previous section, a general regularized form of the DF problem can be written as 



x = argmm 



1 



HxH^+A^l (x) 



(20) 



where the convex function ( x ) can take different penalty norms such as: the smooth Tikhonov 1 1 Hi x 1 1 ^ ; 
the non-smooth /i-norm HLx^; and the smooth Huber-norm ||Lx|| Hub . 

For the case of the linear Tikhonov regularization, computation of the solution requires to invert the 
Hessian (H T R _1 H + 2AL T L) of the objective function similar to equation Q. Analogous to the DS 
problem, the covariance of the posterior distribution is (H T R _1 H + 2AL T L) 

Based on the selected type of regularization, statistical interpretation of the DF regularized class of prob- 
lems is also similar to what was explained in Section 3.2. In other words, given the augmented classic 



observation model in (17), it is easy to see that the solution of (16) is the ML estimator while (20) can be 



interpreted as the MAP estimator with a prior density depending on the form of the regularization term. 



4.2. Example on Rainfall DF 

To quantitatively analyze the effectiveness of the regularized DF for rainfall data, we reconstructed two 
synthetic low-resolution and noisy observations from the original high-resolution NEXRAD reflectivity 
snapshot. To resemble different sensor constraints we chose different smoothing and downsampling oper- 
ations for each of the reconstructed field. The first observation field yi was produced, at resolution 6x6 
km, using a simple averaging filter of size 6x6 followed by a downsampling ratio of s c = 6. Analogously, 
the second field y2 was generated at scale 12x12 km using a Gaussian smoothing kernel of size the 12x12 
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Figure 8: Data Fusion and Downscaling of multi- sensor remotely sensed rainfall reflectivity fields using the 
Huber regularization. (a-b) Reconstructed low-resolution and noisy rainfall observations at scale 6 and 12 
km in grid spacing, (c) The results of the WLS solution in (16), and (d) the solution of Huber regularized 
DF with A le-3 and T = le-2. 



with a standard deviation of 4. A white Gaussian noise, with standard deviation of le-2 and 2e-2 was also 
added, respectively, to resemble the measurement random error. Roughly speaking, this selection of the 
error magnitudes implies that the degree of confidence (relative weight) on the observation at 6 x 6 km is 
twice as large as for the one at 12 x 12 km scale. According to the selected smoothing and downsampling 
operations, the downgrading operators H 1 and H 2 are designed to produce a high-resolution field at the 
scale of 1 x 1 km. To solve the DF problem, we have used the same settings for the Gradient Projection 
(GP) method as explained in Appendix A. 



The solution of the ill-conditioned WLS formulation or the ML estimator in (16) is blocky, out of range 



and severely affected by the amplified inverted noise (Figure [8p). On the other hand, the Huber regu- 
larization can properly restore a fine-scale and coherent estimate of the rainfall field. The results show 
that almost 30% of the uncaptured subgrid energy of the rainfall field can be restored through solving the 
regularized DF problem (Table [2]). Improvements of the selected fidelity measures in the DF problem is 
more pronounced than the results of the DS experiment. This naturally arises, because more observations 
are available in the DF problem than the DS one and thus, the results are likely to be more accurate. 
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Metric 


Observations 


DF results 




6x6 km 


12 x 12 km 


lxl km 


RMSE r 


0.25 


0.35 


0.17 


MAE r 


0.21 


0.32 


0.15 


SSIM 


0.60 


0.50 


0.72 


PSNR 


21.3 


18.1 


25.0 



Table 2: Values of the selected fidelity metrics in the rainfall DF experiment using the Huber regularization, 
see the text for the definitions. Here, metrics refer to comparison of the low-resolution (6x6 and 12 x 12 
km) observations the DF results with the true field (lxl km). 



5. Regularized Variational Data Assimilation 



5.1. Problem Formulation 



Compared to the previously explained problems of downscaling and data fusion, the data assimilation (DA) 
problem is more involved in the sense that we need to address the evolution of a dynamical system and the 
available (low-resolution) observations at the same time in the estimation process. Despite the increased 
complexity, DA shares the same principles with the explained formulations of the DS and DF problems, 
from the estimation point of view. Here, we briefly explain the linear 3D and 4D-VAR data assimilation 
schemes and extend their formulations to a regularized format. Sample results of the regularized variational 
data assimilation problem are illustrated on the estimation of the initial conditions of the linear heat 
equation in a 3D-VAR setting. 

The 3D-VAR is a memoryless assimilation method. In other words, at each time step, the best estimate of 
the true state or say analysis is obtained based only on the present-time noisy observations and background 
information of the dynamical system. The analysis is then being used for forecasting the state at the next 
time step an so on. Suppose that the true state of interest at discrete time is denoted by x& E M m , a 
single noisy observation is y& E K n , and E M m represents the background state. In the linear 3D-VAR 
data assimilation problem, obtaining the analysis state E R m amounts to finding the minimum point 
of the following cost function: 



1 




2 1 , 






B- + 2-I 


2 





\y k - Hxfc|| R _i . 



(21) 



In equation (21), B E R mxm is the background error covariance matrix, H denotes the observation 
operator, while the analysis is the optimal solution: x^ = argmin {J73d(x&)}. Here we assume that the 

x k 

observation matrix is translation invariant. Clearly, this 3D-VAR problem is the WLS problem which has 
the following analytic solution 



L x 6 , + H T R- 1 



Yk 



(22) 
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Because the error covariance matrices are positive definite, the matrix B _1 +H T R _1 H is always positive 
definite and hence invertible. Thus, solution of the 3D-VAR requires no rank or dimension assumption on 
H. However, this problem might be very ill-conditioned depending on the architecture of the covariance 
matrices and the measurement operator. 

The classic 4D-VAR is, indeed, an extension to the explained 3D-VAR which exploits the temporal memory 
of the system by constraining the solution (analysis) to the underlying dynamics. Assuming that we are 
interested in estimating the initial condition of a dynamical model at previous discrete time to be used 
for an improved forecast in future time. The 4D-VAR assimilation method is formulated in such a way 
that uses the initial background state x| and also the observations within a finite discrete time interval 
[£&,..., £&+t]- Accordingly, this assimilation method amounts to obtaining the minimum point of the 
following cost function: 



Jad ( x fc) 



B-. + 2 



IE 



Hx, 



|2 

Ir.7 1 



(23) 



while constraining the solution to the underlying dynamics by assuming that = M^x&, where U>tk- 
The model operator £ is a discrete linear representation of the system dynamics that evolves the state 



Courtier and Talagrand 


, 1990; 


Daley , 


1993 


; Zupanski, 


1993; Kalnay, 



that, in the above formulation it is implicitly assumed that the background and all measurement errors 
are mutually uncorrelated. Clearly, the linear 4D-VAR optimization contains a background cost which 
measures the weighted Euclidean distance of the true initial condition to the background state at the 
beginning of the interval and an accumulated cost, associated with all of the noisy observations within the 
selected time interval. Taking into account the imposed constraint by the model operator, the 4D-VAR 
cost function can be recast as, 



J^D^k) 



2 1^ 

i=k 



HM 



k, i x fc llj^ - 1 



(24) 



where its optimal solution, x^ = argmin { JAD^k)}-, is the analysis state at k time-step. Thus, it is easy 

to see that, the linear 3D and 4D-VAR problems are in the category of the classic WLS problems which 
might be very ill-conditioned. 

Analogous to the previous discussions, the generic regularized form of the linear 3D-VAR under the 
predetermined L-transformation might be considered as follows: 



*k = argmin {J 3D (x fc ) + A^ L (x*.)} , 

Xfc 



(25) 



where ( x ) can take any of the explained regularization penalty functions including: the smooth 



Tikhonov ||Lx|L; the non-smooth Zi-norm 



iLxHp and the smooth Huber-norm 



|Lx| 



Hub' 



It is easy 
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to realize that the regularized 4D-VAR estimation of the the initial condition can also take the following 
form: 

x£ = argmin {J 4D (x fc ) + A tp L (*k) } • (26) 

In the above regularized formulations, the solution not only becomes close to the background and ob- 
servations, in the weighted Euclidean sense, but it is also enforced to follow a regularity imposed by the 
(x). Here, we emphasize that the regularization typically yields a stable and improved solution with 
less uncertainty; however, this gain comes at the price of introducing bias in the solution whose magnitude 



can be kept small, by proper selection of the regularizer \Hansen, 20101. 



5.2. Statistical Interpretation 



Statistical interpretation of the classic variational DA problems is a bit tricky compared to the DS and DF 
class of problems, mainly because of the involvement of the background information in the cost function. 
Lorenc |1986| derived the 3D-VAR cost function using Bayes theorem and called it the ML estimator [see, 
e.g., Lorenc, 1988; Bouttier and Courtier, 20021. More recently, it has been argued that the 4D-VAR, 



and thus as a special case the 3D-VAR cost function, can be interpreted via the Bayesian MAP estimator 



Johnson et a/., 2005; Freitag et al. t 2010; Nichols , 20101. For notational convenience, here we only explain 



the statistical interpretation of the 3D-VAR and its regularized version which can be easily generalized for 
the case of the 4D-VAR problem. 

As discussed earlier, the ML estimator is basically a frequentist view to estimate the most likely value of 
an unknown deterministic variable from an (indirect) observation with random nature. The ML estimator 
intuitively requires to find the state that maximizes the likelihood function as xml — argmaxp (y|x). Let 

X 

us assume that, at time step the background is just a (random) realization of the true deterministic 
state x&. In other words, we consider x^ = x& + w, where the error w can be well explained by a zero mean 
Gaussian density AA(0, B), uncorrelated with the observation error, E [wv T ] = 0. Here, the background 
state is treated similar to an observation with random nature. Thus, let us recast the problem of obtaining 
the analysis as a classic linear inverse problem by augmenting the available information in the from of 



y Hx/e + v, where y 
diagonal covariance matrix 



(4) 



T 



H = [I, H T ] , and v ~ jV(0, R) with the following block 



R 



B 
R 



(27) 



Notice that R is block diagonal because the background and observation errors are uncorrelated. Following 
the augmented representation and applying — log(-), we have — log p(y|x&) oc V 2 (y ~~ H x /c) T fL _1 (y — 
Hx fc ) and thus it is easy to see that the ML estimator in terms of the augmented observations, x^ = 
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argmaxp (y|x&), is equivalent to minimizing the 3D-VAR cost function in (21). Therefore, following this 



x k 



statistical interpretation, the classic 3D-VAR, can be derived via the frequentist ML estimator. 

On the other hand, from the Bayesian perspective, the state of interest and the available observations 
are considered to be random and the MAP estimator is the optimal point which maximizes the posterior 
density as xmap — argmaxp (x|y). Let us assume a priori that the (random) state of interest has a 

X 

Gaussian density with the mean X5 and covariance B, that is p(~Xk) ~ -^( x t' B). More formally, this 
assumption implies that the deterministic background is the central (mean) forecast and is related to the 
random true state via x& = x| + w, where w ~ A/"(0, B). Therefore, using Bayes theorem; see equation 
(12), it immediately follows that the 3D-VAR is the MAP estimator with the assumed Gaussian prior for 
the true state, x^ = argmax^ (x&|y). 

Xk 



In conclusion, the regularized 3D-VAR in (25) might be interpreted as the MAP estimator, x^ = argmaxp (x&|y) , 

x k ~~ 

with the prior density, p(~Xk) oc A^l (x/c), when we follow the frequentist approach and use the augmented 
notation to interpret the classic 3D-VAR as the ML estimator. On the other hand, taking the MAP 
interpretation for the classic 3D-VAR, the regularized version might be understood as the MAP estima- 
tor which also accounts for an extra and independent prior on the distribution of the state under the L 
transformation. 



5.3. Heat Equation Example 

The promise of the proposed regularized 3D-VAR data assimilation methodology, is shown via assimilating 
noisy observations into the dynamics of heat equation with top-hat initial condition. Specifically, we 
constructed noisy background and noisy-low-resolution observations of the top-hat initial condition and 
then demonstrated the effectiveness of a proper regularization on the quality of the obtained analysis 
and forecast states. In the assimilation cycle, we obtained the analyses using the classic and regularized 
3D-VAR assimilation methods and then, we examined those analysis states to obtain the forecast state 
at the next time step. Then the computed analysis and forecast states are compared with their available 
ground-truth counterparts. Although the heat equation has a diffusive nature and is not sensitive to its 
initial condition, the provided examples are very illustrative about the role of regularization on the quality 
of solutions. 

For a space-time representation of a ID scalar quantity x (s, £), the well-known heat equation is 

= 7V 2 *(M) (28) 
x{s, 0) = xq(s), 

where 00 < s < 00, < £ < 00. For mathematical treatment, let us assume that 7 = 1 [ l2 /t]. It is well 
understood that the general solution of the heat equation at time t is then given by the convolution of the 
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initial condition with the fundamental solution (kernel) as 



x(.,«)=/ K (. 



r, i) xo(r) c?r, 



where 



t) = (47rt)- m / 2 exp 



At 



(29) 



(30) 



We can see that x (s, t) is obtained via convolution of the initial condition by a Gaussian kernel with the 
standard deviation of a = y/2t. Clearly, estimation of the initial condition xq(s) only from the diffused 
and possibly noisy observations x(s, t) is an ill-posed deconvolution problem (see equation [I]) . 

To reconstruct a 3D-VAR assimilation experiment, we assume that the true initial condition in discrete 
space is a vector with 256-elements (x G M m where m — 256) as follows: 



x 



112 < Xi < 144 
otherwise, 



(31) 



the so-called top-hat initial condition. We added a white Gaussian noise with a w = 0.05 to the true initial 
condition for reconstructing the background state Xq for the assimilation experiment. 

We assumed that the observation vector is a downgraded version of the true state, while the sensor can 
only capture the mean of every four neighbor elements of the true state. In other words, the observation 
is a noisy and low-resolution version of the true state with one quarter of its size (Figure [9]). To this end, 
using the linear model in Q, we employed the following architecture for the observation operator: 



H 



1 11 1 0000 ••• 0000 
0000 1 1 11 ••• 0000 

0000 0000 ••• 1111 



(32) 



and an n-element white Gaussian observation error with a v — 0.03, where n = 64. 

The top-hat initial condition is selected to emphasize the role of regularization, especially those with 
linear penalization (i.e., the Huber or /i-norm ). Clearly, the first order derivative of the above initial 
condition is very sparse. In other words, the first order derivative is zero everywhere on its domain 
except at the location of the two jumps, resembling a heavy-tailed and sparse statistical distribution. This 
underlying structure prompts us to use the /i-like regularization and first order differencing operator for 
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Figure 9: (a) The true initial condition xq and the results of the heat equation at t = 5 and 100 [T], obtained 
from convolution of the initial condition with the fundamental Gaussian kernel, (b) The reconstructed 
background state by adding a white noise with <j w = 0.05. (c) The low-resolution and noisy observation with 
a v = 0.03. (d) The results of the classic 3D-VAR and the regularized version using the Tikhonov (T3D-VAR) 
and the Huber (H3D-VAR) regularizations (see equation [25]) . (e-f) Magnified parts of of the graphs in (d) 
over the shown zooming windows. 



the L-transform in (25), as follows: 



-1 




1 
-1 1 










-1 1 



(m-l)xm 



(33) 



Note that, instead of incorporating a measure of curvature in the regularization term to impose smoothness 
on the solution, here we seek a solution with minimize total variation. 

Figure [9] shows the inputs of the assimilation experiment (left panels) and the results of the analysis 
cycle (right panels) using the classic versus the regularized 3D-VAR estimators. In this example, it is 
clear that the classic solution overfits, while slightly damps the noise. Indeed, the 3D-VAR is unable 
to effectively damp the high-frequency error components and impose the underlying regularity of the 
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true state, because, its cost function is only about minimizing the weighted squared error. On the other 
hand, in the regularized assimilation methods not only the error term, but also a cost associated with 
the state regularity is also minimized. The Tikhonov regularization (T3D-VAR), i.e.,^L ( x ) — 1 1 1 1 ^ led 
to a smoother result compared to the classic one with slightly better error statistics. However, result 
of the Huber regularization (H3D-VAR), i.e., ( x ) — l|Lx|| Hub , is the best. The rapidly varying noisy 
components are effectively damped in this regularization, while the sharp jump discontinuities have been 
preserved better than the T3D-VAR. The quantitative metrics in Table [3j indicate that in the analysis 
cycle, the RMSE and MAE metrics are improved dramatically, up to 85% in the H3D-VAR scheme. 



As previously explained, there is no unique and universally accepted methodology for automated selection 
of the regularization parameters, namely A and T. Here, to select the best parameters in the above 
assimilation examples, we performed a few trial and error experiments. In other words, over a feasible 
range of parameter values, we computed the solutions of the regularized assimilation methods and obtained 



the RMSE measure by comparing those solutions with the true initial condition xo- Figure [TO] shows the 
RMSE for different choices of regularization parameters in both T3D-VAR and H3D-VAR. Note that, the 
true initial condition is definitely not available in practice; however, here we picked the optimal values 
of the regularization parameters in the RMSE sense for comparison purposes and demonstrating the 
importance of a proper regularization. In the T3D-VAR, as expected, larger values of A typically damp 
rapidly varying error components of the noisy background and observation; however, they may give rise to 
an overly smooth solution with larger bias and RMSE (Figure [To^l). Here, for the T3D-VAR experiment, 
we picked the value A^ = 0.05 associated with the minimum RMSE (Figure [9]). In the H3D-VAR, in 
addition to the regularizer A# , we also need to choose the optimal thresholding value T of the Huber- 
norm. A contour plot of the RMSE values for different choices of A# and T is shown in Figure [TUfc . By 
inspection, we roughly picked \h =35 and T — 1.5e-3 for the H3D-VAR assimilation experiment presented 
in Figure |9| 

The main purpose of the DA process is, indeed, to increase the quality of the forecast. Given the initial- 
time analysis state, we can forecast the profile of the scalar quantity, x(s, £), at any future time step 
through the heat equation. One important property of the heat equation is its diffusivity. In other words, 
naturally, noisy components and rapidly varying perturbations are damped but become more correlated as 
the profile of the initial condition evolves. Thus, rapidly-varying uncorrelated error components become 
low-varying and correlated features which their detection and removal are naturally more difficult than 



Cycle 


RMSE 


MAE 




3D-VAR 


T3D-VAR 


H3D-VAR 


3D-VAR 


T3D-VAR 


H3D-VAR 


A 


0.0475 


0.0397 


0.0067 


0.0376 


0.0317 


0.0043 


F 


0.0090 


0.0088 


0.0043 


0.0071 


0.0070 


0.0033 



Table 3: The root mean squared error (RMSE) and the mean absolute error (MAE) for the studied classic 
and regularized 3D-VAR in the analysis cycle (A) and forecast step (F) . 
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Figure 10: (a) Root mean squared error (RMSE) of the implemented T3D-VAR as a function of the regu- 
larizer At- (b) RMSE surface for the H3D-VAR experiment with different choices of the regularizerA# and 
the threshold value T of the Huber-norm. Clearly, depending on the choice of the regularization method, the 
strength of the regularizer (magnitude of A) might be markedly different. 



the uncorrelated ones. Figure 11 1, shows the forecast profile at t = 10 [T], obtained by convolution of 
the initial profile with a Gaussian kernel having standard deviation of a = y/2 x 10. The results indicate 
the importance of using a proper regularization on the quality of the forecast in the simple heat equation. 
The forecasts based on the classic 3D-VAR and the T3D-VAR almost coincide while the regularized result 
is marginally better. This behavior arises because neither of those methods could properly eliminate the 
noisy features in the analysis cycle and hence low-varying error components are appeared in the forecast 
profile. However, the quality metrics in Table [3] indicate that using the H3D-VAR, the RMSE and MAE 
of the forecast are almost improved more than 50% compared to the other methods. 



6. Conclusion 



In this paper, we presented a new direction in approaching hydrometeorological estimation problems by 
taking into account important continuity and statistical properties of the underlying states such as the 
presence of sharp jumps, isolated singularities (i.e., local extremes) and statistical sparsity in the derivative 
space. We started by explaining the concept of regularization and discussed about the common points 
of the hydrometeorological problems of downscaling (DS), data fusion (DF), and data assimilation (DA) 
as discrete linear inverse problems. We argued about the importance of proper regularization which not 
only makes an inverse problem well-posed but also imposes the desired regularity and statistical property 
on the solution. Regularization methods were theoretically linked to the underlying statistical structure 
of the states and it was shown how the statistical information about the density of the state, or its 
derivative, can be used for proper selection of the regularization method. Specifically, we emphasized 
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Figure 11: (a) True forecast based on temporal evolution of the top-hat initial condition, Figure |9p, under 
the heat equation at t = 10 [T]. (b-c) Magnified windows showing the forecast quality using different 3D- 
VAR assimilation methods. Ineffective error removal by the classic 3D and T3D-VAR at the analysis cycle 
produced large-scale correlated error in the forecast profile, while this problem is less substantial in the result 
of the H3D-VAR (see, Table [3|. 

three types of regularization, namely, the Tikhonov, Zi-norm and Huber regularizations. We theoretically 
showed that these methods are statistically equivalent to the maximum a posteriori (MAP) estimator while 
respectively assuming the Gaussian, Laplace and Gibbs prior density for the state. It was argued that 
piece- wise continuity of the state and the presence of frequent jumps are often translated into heavy-tailed 
distributions in the derivative space that favor the use of /i-regularization. 

Informed by the non-Gaussian distribution of rainfall derivatives, the effectiveness of the regularized DS 
and DF problems was tested via analysis of remotely sensed precipitation fields. Then the performance 
of the regularized DA was studied via assimilating noisy observations into evolution of the heat equation. 
We showed that regularized assimilation methods outperform the classic 3D-VAR method, especially for 
the case where the initial condition exhibits a sparse distribution in the derivative space (e.g., first order 
derivative of the top-hat initial condition). 

The presented frameworks can be potentially applied to other hydrometeorological problems, such as 
soil moisture downscaling and fusion. Clearly, proper selection of the regularization method requires 
careful statistical analysis of the underlying state of interest. Moreover, the problem of rainfall or soil 
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moisture retrieval from satellite microwave radiance, can be considered as a non-linear inverse problem. 
This nonlinear inversion may be cast in the presented context, provided that the nonlinear kernel can be 
(locally) linearized with sufficient accuracy. Application of regularization in the DA problems is in its 
infancy [e.g.; see, Freitag et aZ., 2012, for a recent study] and is expected to play significant role over the 
next decades, especially for highly non-linear chaotic dynamical systems with frequent jumps. 
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Appendix 

A. Gradient Projection for Huber regularization 

Here, we present the gradient project (GP) method, using the Huber regularization, only for the down- 
scaling (DS) problem, which can be easily generalized to the data fusion (DF) and data assimilation (DA) 
cases. In case of the DS problem, the cost function and gradient of the Huber regularization with respect 
to the elements of the downscaled field are 

J(x) = i||y-Hx||^_ 1+ A||Lx|| Hub (A.l) 



VJ(x) = H T R _1 (y - Hx) + \L T p' T (Lx) , 



(A.2) 



where 



p' T (x) 




(A.3) 



As is evident, the cost function in (A.l ), is a smooth and convex function. Thus its minimum can be easily 



obtained using efficient first order gradient descent methods in large dimensional problems. However, 
rainfall is a positive process and in order to obtain a feasible downscaled field x, the regularized DS 
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problem needs to be solved on the non-negative orthant {x| x\ > Vi = 1, . . . , ra}, 

x = argmin {^(x)} 
s.t. x y 0. 



(A.4) 



We have used one of the primitive gradient projection (GP) methods to solve the above constrained DS 



problem [see, \Bertsekas\ |1999[ pp. 228]. Accordingly, to obtain the solution of ( |A.4[ ), it amounts to 
obtaining the fixed point of 

aVJ(x*)] + , (A.5) 



x* 



where a is a stepsize and 



x 



if x < 
otherwise, 



(A.6) 



denotes the Euclidean projection operator onto the non-negative orthant. As is evident, the fixed point 
can be obtained iteratively as 

Xfc+i = [xfe - OL k V J(x k )} + • (A.7) 

Thus, if the descent at step k is feasible (i.e., x& — afcVJ(xfc) y 0) the GP iteration becomes an ordinary 
unconstrained steepest descent method, otherwise the result is mapped back onto the feasible set by the 



projection operator in (A.6). In effect, the GP method finds iteratively the closest feasible point, in the 



Euclidean distance, to the solution of the original unconstrained minimization. 

In our study, the stepsize (a k ) was selected using the Armijo rule, or the so called backtracking line search, 
that is a convergent and very effective stepsize rule and depends on two constants < £ < 0.5 , < s < 1. 
In this method, the step size is assumed a k = <^ mfe , where m k is the smallest non-negative integer for which 



J (x fe - a k VJ(x k )) < J(x k ) - ^ fc VJ(x fc ) T VJ(x fc ) 



(A. 



In our DS examples the above backtracking parameters are set to £ = 0.2 and s = 0.5 [see, Boyd and 
Vandenberghe , 2004, pp.464 for more explanation]. In our coding, the iterations terminate if ^ Xfc -! jk < 

ll X fc — ill 9 



10 D or the number of iterations exceeds 200. 
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